Constructing the CODA Corpus: A Parallel Corpus of Monologues and Expository Dialogues
Abstract
We describe the construction of the CODA corpus, a parallel corpus of monologues and expository dialogues. The dialogue part of the corpus consists of expository, i.e., information-delivering rather than dramatic, dialogues written by several acclaimed authors. The monologue part of the corpus is a paraphrase in monologue form of these dialogues by a human annotator. The corpus was constructed as a resource for extracting rules for automated generation of dialogue from monologue. Using authored dialogues allows us to analyse the techniques used by accomplished writers for presenting information in the form of dialogue. The dialogues are annotated with dialogue acts and the monologues with rhetorical structure. We developed annotation and translation guidelines together with a custom-developed tool for carrying out translation, alignment and annotation.
Similar resources
Generating Expository Dialogue from Monologue: Motivation, Corpus and Preliminary Rules
Generating expository dialogue from monologue is a task that poses an interesting and rewarding challenge for Natural Language Processing. This short paper has three aims: firstly, to motivate the importance of this task, both in terms of the benefits of expository dialogue as a way to present information and in terms of potential applications; secondly, to introduce a parallel corpus of monolo...
Question Generation in the CODA project
In the ongoing CODA project, we are developing a system for automatically converting monologue into dialogue. The dialogue is generated in a two-step approach. Firstly, snippets of input monologue are mapped to dialogue act sequences. Secondly, these sequences are verbalized. The conversion relies partly on analysing input monologue in terms of its discourse relations. This short paper briefly ...
Speech Data Corpus for Verbal Intelligence Estimation
The goal of our research is the development of algorithms for automatic estimation of a person’s verbal intelligence based on the analysis of transcribed spoken utterances. In this paper we present the corpus of German native speakers’ monologues and dialogues about the same topics collected at the University of Ulm, Germany. The monologues were descriptions of two short films; the dialogues we...
Hedges in English for Academic Purposes: A Corpus-based study of Iranian EFL learners
Hedges, as tools to express tentativeness and doubt, have been studied in plenty of research papers in the Iranian EFL research setting. However, their use in a learner corpus, portraying Iranian learner English, is in need of more research attention. With this end in view, this study aimed at investigating how Iranian EFL learners who have majored in English-related fields in Iran deployed hed...
Publication date: 2010